75 research outputs found

    NIL Is not nothing: Recognition of Chinese network informal language expressions

    Informal language is actively used in network-mediated communication, e.g., chat rooms, BBSs, email, and text messages. We refer to the anomalous terms used in such contexts as network informal language (NIL) expressions. For example, “偶(ou3)” is used to replace “我(wo3)” in Chinese ICQ. Without unconventional resources, knowledge, and techniques, existing natural language processing approaches are less effective at dealing with NIL text. We propose to study NIL expressions with a NIL corpus and to investigate techniques for processing them. Two methods for Chinese NIL expressio

    Dataless text classification with descriptive LDA

    Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure the semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback: it relies on a large-scale, finely compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to that of state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.
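
    The central idea, using category description words to shape the Dirichlet priors so that each latent topic lines up with one category, can be illustrated with a small collapsed Gibbs sampler. This is a simplified seeded-LDA stand-in rather than DescLDA itself: instead of inferring priors through a describing device, it simply adds extra topic-word prior mass (the hypothetical `seed_boost` parameter) to each category's description words, then labels every document with its dominant topic. The function name, parameters, and toy corpus are illustrative assumptions.

```python
import random

def seeded_lda_classify(docs, seeds, alpha=0.1, beta=0.01, seed_boost=1.0,
                        iters=200, rng_seed=0):
    """Label unlabeled, tokenized docs using LDA with description-word priors.

    Simplified seeded-LDA sketch (an assumption), not DescLDA's describing device.
    docs:  list of token lists
    seeds: dict mapping category -> description words; topic k is category k
    """
    cats = list(seeds)
    K = len(cats)
    vocab = sorted({w for d in docs for w in d} | {w for ws in seeds.values() for w in ws})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Asymmetric topic-word prior: category k's description words get extra mass in topic k.
    prior = [[beta] * V for _ in range(K)]
    for k, c in enumerate(cats):
        for w in seeds[c]:
            prior[k][widx[w]] += seed_boost
    prior_sum = [sum(row) for row in prior]

    rng = random.Random(rng_seed)
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # total tokens per topic
    ndk = [[0] * K for _ in docs]      # document-topic counts
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            nkw[k][widx[w]] += 1
            nk[k] += 1
            ndk[d][k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = widx[w], z[d][i]
                # Remove this token's assignment, then resample from the full conditional.
                nkw[k][v] -= 1
                nk[k] -= 1
                ndk[d][k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + prior[t][v])
                           / (nk[t] + prior_sum[t]) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                nkw[k][v] += 1
                nk[k] += 1
                ndk[d][k] += 1

    # Each document takes the category whose topic holds most of its tokens.
    return [cats[max(range(K), key=lambda t: ndk[d][t])] for d in range(len(docs))]

docs = [["goal", "team", "match", "goal"], ["cpu", "code", "compile", "cpu"],
        ["team", "goal", "league"], ["code", "cpu", "bug"]]
seeds = {"sports": ["goal", "team"], "computing": ["cpu", "code"]}
print(seeded_lda_classify(docs, seeds))
```

    Because topic k is tied to category k only through its prior, no labeled documents are needed; unseeded words such as "match" or "bug" are pulled toward whichever category dominates their document.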

    Commonsense knowledge enhanced memory network for stance classification

    Stance classification aims to identify the attitude expressed in a text toward a given target as favorable, negative, or unrelated. Existing stance classification models leverage only textual representations and ignore commonsense knowledge. To better incorporate commonsense knowledge into stance classification, we propose a novel model, the commonsense knowledge enhanced memory network, which jointly represents the textual and commonsense knowledge of the given target and text. The textual memory module in our model treats the textual representations as memory vectors and uses an attention mechanism to highlight the important parts. In the commonsense knowledge memory module, we jointly leverage the entity and relation embeddings learned by the TransE model to take full advantage of the constraints of the knowledge graph. Experimental results on the SemEval dataset show that combining the commonsense knowledge memory with the textual memory can improve stance classification.
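
    Two ingredients named in the abstract can be sketched in a few lines: an attention read over memory vectors (a weighted sum whose weights are a softmax of query-memory dot products) and the TransE scoring function, which treats a relation as a translation so that h + r ≈ t for a plausible triple (h, r, t). This is a generic illustration with toy 2-dimensional vectors; the function names are hypothetical and it does not reproduce the paper's architecture.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_read(query, memory):
    """Return (weights, read vector) for dot-product attention over memory slots."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(query)
    read = [sum(w * slot[j] for w, slot in zip(weights, memory)) for j in range(dim)]
    return weights, read

def transe_distance(h, r, t, norm=1):
    """TransE dissimilarity ||h + r - t|| (L1 by default, L2 if norm=2); lower is more plausible."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))

# The query aligns best with the second memory slot, so that slot gets the largest weight.
memory = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
weights, read = attention_read([0.0, 1.0], memory)
print([round(w, 3) for w in weights])
print(transe_distance([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0: a perfect translation
```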

    Learning user and product distributed representations using a sequence model for sentiment analysis

    In product reviews, the distribution of polarity ratings over reviews written by different users or evaluated on different products is often skewed in the real world. As such, incorporating user and product information would be helpful for the sentiment classification of reviews. However, existing approaches ignore the temporal nature of reviews posted by the same user or evaluated on the same product. We argue that the temporal relations among reviews may be useful for learning user and product embeddings, and thus propose employing a sequence model to embed these temporal relations into user and product representations so as to improve the performance of document-level sentiment analysis. Specifically, we first learn a distributed representation of each review with a one-dimensional convolutional neural network. Then, taking these representations as pretrained vectors, we use a recurrent neural network with gated recurrent units to learn distributed representations of users and products. Finally, we feed the user, product and review representations into a machine learning classifier for sentiment classification. Our approach has been evaluated on three large-scale review datasets from IMDB and Yelp. Experimental results show that: (1) sequence modeling for distributed user and product representation learning can improve the performance of document-level sentiment classification; and (2) the proposed approach achieves state-of-the-art results on these benchmark datasets.
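
    The user-representation step can be sketched as follows: sort a user's review vectors by timestamp and run them through a GRU, taking the final hidden state as the user embedding (products would be handled the same way). This minimal sketch assumes the per-review vectors already exist, standing in for the CNN outputs; the weights are small hand-set values rather than trained parameters, and `GRUParams`, `gru_step`, and `user_embedding` are hypothetical names.

```python
import math
from dataclasses import dataclass

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

@dataclass
class GRUParams:
    # Input (w*) and recurrent (u*) weights plus biases (b*) for update, reset, candidate.
    wz: list
    uz: list
    bz: list
    wr: list
    ur: list
    br: list
    wh: list
    uh: list
    bh: list

def gru_step(p, x, h):
    """One GRU update, using the convention h' = (1 - z) * h + z * h_cand."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p.wz, x), matvec(p.uz, h), p.bz)]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p.wr, x), matvec(p.ur, h), p.br)]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
    cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(p.wh, x), matvec(p.uh, rh), p.bh)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def user_embedding(p, reviews, hidden_dim):
    """reviews: list of (timestamp, review_vector); returns the final hidden state."""
    h = [0.0] * hidden_dim
    for _, vec in sorted(reviews, key=lambda tv: tv[0]):  # oldest review first
        h = gru_step(p, vec, h)
    return h

half = [[0.5, 0.0], [0.0, 0.5]]
eye = [[1.0, 0.0], [0.0, 1.0]]
params = GRUParams(wz=half, uz=half, bz=[0.0, 0.0], wr=half, ur=half, br=[0.0, 0.0],
                   wh=eye, uh=half, bh=[0.0, 0.0])
reviews = [(3, [0.0, 1.0]), (1, [1.0, 0.0]), (2, [0.5, -0.5])]  # (timestamp, vector)
print(user_embedding(params, reviews, hidden_dim=2))
```

    Sorting inside `user_embedding` makes the result depend on the chronological order of reviews rather than on the order in which they happen to be stored, which is the temporal signal the abstract argues for.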

    BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

    Although deep pre-trained language models have shown promising benefits in a wide range of industrial scenarios, including Click-Through-Rate (CTR) prediction, integrating pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging. Up to now, two directions have been explored to integrate multi-modal inputs when fine-tuning pre-trained language models. The first fuses the outputs of language models and non-textual features through an aggregation layer, resulting in an ensemble framework in which the cross-information between textual and non-textual inputs is learned only in the aggregation layer. The second splits non-textual features into fine-grained fragments and transforms the fragments into new tokens combined with the textual ones, so that they can be fed directly to the transformer layers of the language model. However, this approach increases the complexity of learning and inference because of the numerous additional tokens. To address these limitations, in this work we propose BERT4CTR, a novel framework with a Uni-Attention mechanism that can benefit from the interactions between non-textual and textual features while keeping training and inference time costs low through dimensionality reduction. Comprehensive experiments on both public and commercial data demonstrate that BERT4CTR can significantly outperform state-of-the-art frameworks for handling multi-modal inputs and is applicable to CTR prediction.
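
    The abstract does not detail how Uni-Attention works, so the sketch below is only a hypothetical illustration of the trade-off it describes. It contrasts the quadratic cost of appending feature tokens to the text sequence with a one-directional scheme in which non-textual feature vectors act as queries over textual token states, after both are projected into a small shared dimension `d`. The names `one_way_attention`, `self_attention_pairs`, and `one_way_pairs` are assumptions, not the BERT4CTR API.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def one_way_attention(features, tokens, wq, wk, wv):
    """Hypothetical one-directional attention: feature vectors attend to text tokens.

    wq, wk, wv (each with d rows) project features and tokens, which may have
    different sizes, into a small shared dimension d = len(wq). Text tokens never
    attend to the features, so the text sequence itself does not grow.
    """
    d = len(wq)
    keys = [matvec(wk, t) for t in tokens]
    vals = [matvec(wv, t) for t in tokens]
    outputs = []
    for f in features:
        q = matvec(wq, f)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, vals)) for j in range(d)])
    return outputs

def self_attention_pairs(n_text, n_feature_tokens):
    # Scores per head per layer if feature fragments are appended as extra tokens.
    return (n_text + n_feature_tokens) ** 2

def one_way_pairs(n_text, n_features):
    # Extra scores if each feature vector only queries the text tokens.
    return n_features * n_text

print(self_attention_pairs(128, 0), self_attention_pairs(128, 64))  # 16384 36864
print(one_way_pairs(128, 64))                                       # 8192
```

    Under these assumptions, appending 64 feature tokens to 128 text tokens raises the per-layer score count from 16,384 to 36,864, whereas letting 64 feature vectors query the text adds 8,192 scores on top of the text's own self-attention.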

    Green finance strategies for mitigating GHG emissions in China: Public spending as a new determinant of green economic development

    To reduce China’s carbon footprint, the government has turned to environmentally friendly financing, and a reduction in CO2 emissions has been reported in some Chinese provinces where green finance has been developed. This study covers numerous Chinese regions from 2010 to 2020. Using dynamic seemingly unrelated regression (DSUR), fully modified ordinary least squares (FMOLS), and dynamic ordinary least squares (DOLS) estimators, it empirically tests the influence of green financing on CO2 emissions in Chinese provinces, with per capita economic growth, public spending, human capital, and industrial structure as core variables. According to the findings, green financing speeds up the reduction of carbon emissions. Moreover, the results show that industrial structure, per capita economic growth, and trade openness increase carbon emissions, whereas public expenditure and human capital contribute significantly to emissions reduction. The findings indicate that a sustainable green environment can be achieved only by boosting the performance of green finance and increasing the level of green finance supported by the Chinese economy. Finally, policymakers should promote public spending on health and education to support environmental protection efforts and encourage green consumption while minimizing the structural problems resulting from economic activity.
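
    As a rough illustration of one estimator named above, the sketch implements textbook single-equation dynamic OLS (DOLS): the long-run coefficient on x comes from regressing y_t on a constant, x_t, and leads and lags of Δx_t, which absorb short-run dynamics and endogeneity. It is not the study's panel specification, omits the DSUR and FMOLS estimators, and uses synthetic data; `dols` and `ols` are hypothetical helper names.

```python
import random

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * c for v, c in zip(a[r], a[col])]
    return [row[k] for row in a]

def dols(y, x, p=1):
    """Textbook single-equation DOLS with p leads and lags of the differenced regressor.

    Regresses y_t on 1, x_t and dx_{t+j} for j = -p..p; returns (intercept, long-run slope).
    """
    dx = [0.0] + [x[t] - x[t - 1] for t in range(1, len(x))]  # dx[0] is never used
    rows, targets = [], []
    for t in range(p + 1, len(x) - p):
        rows.append([1.0, x[t]] + [dx[t + j] for j in range(-p, p + 1)])
        targets.append(y[t])
    coef = ols(rows, targets)
    return coef[0], coef[1]

# Synthetic cointegrated data: x is a random walk and y = 2 + 0.5 x + noise.
rng = random.Random(42)
x = [0.0]
for _ in range(299):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 0.3) for xi in x]
print(dols(y, x, p=2))
```

    With a random-walk regressor the estimated slope should land close to 0.5; the number of leads and lags `p` is a modeling choice that a real study would select, for example, by an information criterion.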

    Application of computer software technology in oilfield geological exploration and development

    As a representative of modern advanced technology, computer technology plays an increasingly important role in people’s lives and production. Its advantages are especially important and evident in energy development. Petroleum is part of China’s energy resource framework, and the importance of its exploration is self-evident. In oilfield geological exploration, the application of computer software technology is an important means of improving the utilization rate of China’s energy resources. This article examines this application and the problems it faces, and provides reasonable solutions.